Traffic Noise Analysis in New York City - 2023

Introduction

Urban noise pollution is a growing concern for city residents, impacting both physical and mental health. In bustling cities like New York, noise generated from traffic is one of the major contributors to environmental pollution. The high density of road networks and heavy traffic, particularly during peak hours, leads to elevated noise levels that affect the quality of life of those living near busy roads.

The purpose of this analysis is to investigate the relationship between traffic volume and noise complaints in New York City during the year 2023. By leveraging geospatial data and statistical analysis, this project aims to determine whether high traffic volumes are directly correlated with increased noise complaints and, if so, to what extent. We will also explore the spatial distribution of traffic and noise complaints to identify potential clusters or patterns across different areas of the city.

Objectives

  1. Data Exploration: We will first explore and clean the datasets obtained, including traffic volume, noise complaints, and road network data.
  2. Data Analysis: The cleaned data will be analyzed to identify patterns and relationships between traffic volumes and noise complaints. Special attention will be given to clustering tendencies.
  3. Spatial Analysis: Using geospatial tools, we will visualize the data to highlight problem areas where noise complaints are highest, possibly correlating them with traffic intensity.
  4. Hypothesis Testing: Finally, formal statistical testing will be conducted to evaluate the hypothesis that increased traffic volume leads to an increase in noise complaints during peak times.

The Hypothesis I proposed is: Areas with higher traffic volumes have a higher intensity (density) of noise complaints in New York City during 2023.

The Null Hypothesis is: There is no spatial relationship between traffic volume and the intensity of noise complaints; any observed patterns are due to random chance.

The Alternative Hypothesis is: There is a significant spatial relationship between traffic volume and noise complaints; areas with higher traffic volumes have a higher density of noise complaints.

Data Sources

  1. Traffic Volume Data (traffic_volume_counts.csv): Contains traffic count data across various road segments in New York City.
  2. Noise Complaint Data (Noise Complaint.csv): Includes historical records of noise complaints made to NYC’s 311 service from 2010 to today. We will filter this dataset to include only data from 2023.
  3. Road Network Data (road_map.geojson): Provides geospatial information about the road network in New York City, including road types and locations.

Approach

We will begin with data cleaning to ensure we are working with accurate and relevant information. This includes filtering the noise complaint data for 2023 and cleaning the traffic volume data as needed. Next, we will conduct exploratory data analysis (EDA) to understand the overall trends in traffic and noise complaints. Visualizations and spatial analyses will help us identify areas of concern. Lastly, we will conduct formal statistical analysis, including spatial autocorrelation, to assess the relationship between traffic volume and noise complaints.

Progress

# Package Loading
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(stringr)
library(sf)
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(ggplot2)
library(mapview)
library(leaflet)
library(spatstat)
## Loading required package: spatstat.data
## Loading required package: spatstat.univar
## spatstat.univar 3.1-1
## Loading required package: spatstat.geom
## spatstat.geom 3.3-4
## Loading required package: spatstat.random
## spatstat.random 3.3-2
## Loading required package: spatstat.explore
## Loading required package: nlme
## 
## Attaching package: 'nlme'
## 
## The following object is masked from 'package:dplyr':
## 
##     collapse
## 
## spatstat.explore 3.3-3
## Loading required package: spatstat.model
## Loading required package: rpart
## spatstat.model 3.3-3
## Loading required package: spatstat.linnet
## spatstat.linnet 3.2-3
## 
## spatstat 3.3-0 
## For an introduction to spatstat, type 'beginner'
library(raster)
## Loading required package: sp
## 
## Attaching package: 'raster'
## 
## The following object is masked from 'package:nlme':
## 
##     getData
## 
## The following object is masked from 'package:dplyr':
## 
##     select
library(dplyr)

Step 1: Data Cleaning and Pre-processing

noise_data <- read_csv("data/Noise_Complaint.csv")
## Rows: 51256 Columns: 40
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (25): Created Date, Closed Date, Agency, Agency Name, Complaint Type, D...
## dbl   (6): Unique Key, Incident Zip, X Coordinate (State Plane), Y Coordinat...
## lgl   (8): Facility Type, Due Date, Taxi Company Borough, Taxi Pick Up Locat...
## dttm  (1): created_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(noise_data)
## # A tibble: 6 × 40
##   `Unique Key` `Created Date`         `Closed Date`         Agency `Agency Name`
##          <dbl> <chr>                  <chr>                 <chr>  <chr>        
## 1     56412499 01/01/2023 07:51:18 PM 01/01/2023 08:19:51 … NYPD   New York Cit…
## 2     56412502 01/01/2023 06:03:27 PM 01/01/2023 06:13:28 … NYPD   New York Cit…
## 3     56412517 01/01/2023 05:29:51 PM 01/01/2023 05:59:09 … NYPD   New York Cit…
## 4     56412522 01/01/2023 10:48:24 PM 01/02/2023 12:30:43 … NYPD   New York Cit…
## 5     56412584 01/01/2023 05:24:31 PM 01/02/2023 04:22:22 … NYPD   New York Cit…
## 6     56412597 01/01/2023 11:53:48 PM 01/02/2023 12:02:56 … NYPD   New York Cit…
## # ℹ 35 more variables: `Complaint Type` <chr>, Descriptor <chr>,
## #   `Location Type` <chr>, `Incident Zip` <dbl>, `Incident Address` <chr>,
## #   `Street Name` <chr>, `Cross Street 1` <chr>, `Cross Street 2` <chr>,
## #   `Intersection Street 1` <chr>, `Intersection Street 2` <chr>,
## #   `Address Type` <chr>, City <chr>, Landmark <chr>, `Facility Type` <lgl>,
## #   Status <chr>, `Due Date` <lgl>, `Resolution Description` <chr>,
## #   `Resolution Action Updated Date` <chr>, `Community Board` <chr>, …
glimpse(noise_data)
## Rows: 51,256
## Columns: 40
## $ `Unique Key`                     <dbl> 56412499, 56412502, 56412517, 5641252…
## $ `Created Date`                   <chr> "01/01/2023 07:51:18 PM", "01/01/2023…
## $ `Closed Date`                    <chr> "01/01/2023 08:19:51 PM", "01/01/2023…
## $ Agency                           <chr> "NYPD", "NYPD", "NYPD", "NYPD", "NYPD…
## $ `Agency Name`                    <chr> "New York City Police Department", "N…
## $ `Complaint Type`                 <chr> "Noise - Vehicle", "Noise - Vehicle",…
## $ Descriptor                       <chr> "Engine Idling", "Car/Truck Horn", "C…
## $ `Location Type`                  <chr> "Street/Sidewalk", "Street/Sidewalk",…
## $ `Incident Zip`                   <dbl> 11420, 11218, 11218, 10040, 10453, 10…
## $ `Incident Address`               <chr> "124-15 OLD SOUTH ROAD", "CHURCH AVEN…
## $ `Street Name`                    <chr> "OLD SOUTH ROAD", "CHURCH AVENUE", "C…
## $ `Cross Street 1`                 <chr> "124 STREET", "CHURCH AVENUE", "CHURC…
## $ `Cross Street 2`                 <chr> "125 STREET", "EAST    4 STREET", "EA…
## $ `Intersection Street 1`          <chr> "124 STREET", "CHURCH AVENUE", "CHURC…
## $ `Intersection Street 2`          <chr> "125 STREET", "EAST    4 STREET", "EA…
## $ `Address Type`                   <chr> "ADDRESS", "INTERSECTION", "INTERSECT…
## $ City                             <chr> "SOUTH OZONE PARK", NA, NA, "NEW YORK…
## $ Landmark                         <chr> "OLD SOUTH ROAD", NA, NA, "BOGARDUS P…
## $ `Facility Type`                  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Status                           <chr> "Closed", "Closed", "Closed", "Closed…
## $ `Due Date`                       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Resolution Description`         <chr> "The Police Department responded to t…
## $ `Resolution Action Updated Date` <chr> "01/01/2023 08:19:54 PM", "01/01/2023…
## $ `Community Board`                <chr> "10 QUEENS", "12 BROOKLYN", "12 BROOK…
## $ Borough                          <chr> "QUEENS", "BROOKLYN", "BROOKLYN", "MA…
## $ `X Coordinate (State Plane)`     <dbl> 1035124, 990750, 990507, 1003671, 101…
## $ `Y Coordinate (State Plane)`     <dbl> 181529, 173905, 173771, 252190, 25017…
## $ `Park Facility Name`             <chr> "Unspecified", "Unspecified", "Unspec…
## $ `Park Borough`                   <chr> "QUEENS", "BROOKLYN", "BROOKLYN", "MA…
## $ `Vehicle Type`                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Taxi Company Borough`           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Taxi Pick Up Location`          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Bridge Highway Name`            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Bridge Highway Direction`       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Road Ramp`                      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Bridge Highway Segment`         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Latitude                         <dbl> 40.66479, 40.64400, 40.64364, 40.8588…
## $ Longitude                        <dbl> -73.81662, -73.97658, -73.97745, -73.…
## $ Location                         <chr> "(40.66478573741414, -73.816622087387…
## $ created_date                     <dttm> 2023-01-01 19:51:18, 2023-01-01 18:0…
# Load traffic volume data
traffic_data <- read_csv("data/traffic_volume_counts.csv")
## Rows: 1712605 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Boro, WktGeom, street, fromSt, toSt, Direction
## dbl (8): RequestID, Yr, M, D, HH, MM, Vol, SegmentID
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Filter for 2023 data, create datetime, and drop old columns
traffic_data_2023 <- traffic_data %>%
  filter(Yr == 2023) %>%
  mutate(datetime = make_datetime(Yr, M, D, HH, MM)) %>%
  dplyr::select(-Yr, -M, -D, -HH, -MM)

# Create a location description and drop unnecessary columns
traffic_data_2023 <- traffic_data_2023 %>%
  mutate(location_desc = str_c(street, " from ", fromSt, " to ", toSt, sep = "")) %>%
  dplyr::select(-street, -fromSt, -toSt)

# Aggregate traffic volume by hour and location
traffic_data_aggregated <- traffic_data_2023 %>%
  mutate(hour = floor_date(datetime, unit = "hour")) %>%
  group_by(location_desc, hour, Boro, Direction, WktGeom) %>%
  summarise(total_volume = sum(Vol, na.rm = TRUE), .groups = "drop")

Filtering and Pre-processing Noise Data

vehicle_noise_data_2023 <- noise_data %>%
  mutate(
    location_desc = str_to_upper(str_trim(`Incident Address`)),
    hour = floor_date(created_date, unit = "hour")
  ) %>%
  filter(!is.na(location_desc) & !is.na(hour)) %>%
  dplyr::select(
    -`Unique Key`, -`Created Date`, -`Closed Date`, -`Agency`, -`Agency Name`,
    -`Vehicle Type`, -`Taxi Company Borough`, -`Taxi Pick Up Location`,
    -`Bridge Highway Name`, -`Bridge Highway Direction`, -`Road Ramp`, -`Bridge Highway Segment`,
    -`Facility Type`, -`Resolution Action Updated Date`, -`Due Date`, -`Resolution Description`,
    -`Community Board`, -`X Coordinate (State Plane)`, -`Y Coordinate (State Plane)`, -`Park Facility Name`,
    -`Park Borough`, -`Complaint Type`, -`Descriptor`, -`Location Type`, -`Address Type`, -`Status`,
    -`Street Name`, -`Cross Street 1`, -`Cross Street 2`, -`Intersection Street 1`, -`Intersection Street 2`,
    -`City`, -`Landmark`, -`created_date`, -`Incident Address`
  ) %>%
  rename(location = location_desc) %>%
  dplyr::select(location, hour, everything()) %>%
  filter(!is.na(Latitude) & !is.na(Longitude))

Step 2: Data Integration and Merging

Since the traffic volume data does not have separate latitude and longitude columns but instead contains route descriptions, and the vehicle noise data does have coordinates, we can follow through with a geospatial join using a proximity-based approach.

Convert Traffic Volume Data to an sf Object and convert Noise Complaint Data to Spatial Data Frame

# Rename and convert traffic data to sf object
traffic_data_aggregated <- traffic_data_aggregated %>%
  rename(Borough = Boro)

traffic_data_aggregated_sf <- traffic_data_aggregated %>%
  st_as_sf(wkt = "WktGeom", crs = 4326)

# Convert vehicle noise data to sf object
vehicle_noise_data_2023_sf <- vehicle_noise_data_2023 %>%
  st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326)

# Spatial join to link noise data with traffic data
vehicle_noise_with_traffic <- st_join(vehicle_noise_data_2023_sf, traffic_data_aggregated_sf, join = st_nearest_feature)
## Warning in st_is_longlat(x): bounding box has potentially an invalid value
## range for longlat data
# Convert the result to a tibble to avoid select issues
merged_data <- as_tibble(vehicle_noise_with_traffic)  # Convert to tibble

# Clean up columns and rename for consistency
merged_data <- merged_data %>%
  dplyr::select(-Borough.y, -hour.y) %>%
  rename(Boro = Borough.x)

Step 3: General Summary Stats and EDAs

str(merged_data)
## tibble [50,448 × 9] (S3: tbl_df/tbl/data.frame)
##  $ location     : chr [1:50448] "124-15 OLD SOUTH ROAD" "CHURCH AVENUE" "CHURCH AVENUE" "1 BOGARDUS PLACE" ...
##  $ hour.x       : POSIXct[1:50448], format: "2023-01-01 19:00:00" "2023-01-01 18:00:00" ...
##  $ Incident Zip : num [1:50448] 11420 11218 11218 10040 10453 ...
##  $ Boro         : chr [1:50448] "QUEENS" "BROOKLYN" "BROOKLYN" "MANHATTAN" ...
##  $ Location     : chr [1:50448] "(40.66478573741414, -73.81662208738771)" "(40.64400329864003, -73.97657773481212)" "(40.64363567249765, -73.9774534914305)" "(40.85885672421463, -73.92979179302147)" ...
##  $ geometry     :sfc_POINT of length 50448; first list element:  'XY' num [1:2] -73.8 40.7
##  $ location_desc: chr [1:50448] "VALENTINE AVENUE from East 178 Street to East Burnside Avenue" "VALENTINE AVENUE from East 178 Street to East Burnside Avenue" "VALENTINE AVENUE from East 178 Street to East Burnside Avenue" "VALENTINE AVENUE from East 178 Street to East Burnside Avenue" ...
##  $ Direction    : chr [1:50448] "NB" "NB" "NB" "NB" ...
##  $ total_volume : num [1:50448] 266 266 266 266 266 266 266 266 266 266 ...
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA
##   ..- attr(*, "names")= chr [1:10] "location" "hour.x" "Incident Zip" "Borough.x" ...
summary(merged_data)
##    location             hour.x                       Incident Zip  
##  Length:50448       Min.   :2023-01-01 00:00:00.0   Min.   :10000  
##  Class :character   1st Qu.:2023-04-16 19:00:00.0   1st Qu.:10039  
##  Mode  :character   Median :2023-06-25 00:00:00.0   Median :10468  
##                     Mean   :2023-06-30 12:12:56.9   Mean   :10736  
##                     3rd Qu.:2023-09-14 23:00:00.0   3rd Qu.:11232  
##                     Max.   :2023-12-31 23:00:00.0   Max.   :11694  
##                                                     NA's   :1      
##      Boro             Location                  geometry     location_desc     
##  Length:50448       Length:50448       POINT        :50448   Length:50448      
##  Class :character   Class :character   epsg:4326    :    0   Class :character  
##  Mode  :character   Mode  :character   +proj=long...:    0   Mode  :character  
##                                                                                
##                                                                                
##                                                                                
##                                                                                
##   Direction          total_volume
##  Length:50448       Min.   :266  
##  Class :character   1st Qu.:266  
##  Mode  :character   Median :266  
##                     Mean   :266  
##                     3rd Qu.:266  
##                     Max.   :266  
## 

Summary Statistics for Key Variables

borough_summary <- merged_data %>%
  group_by(Boro) %>%
  summarise(total_complaints = n(), .groups = "drop")

print(borough_summary)
## # A tibble: 6 × 2
##   Boro          total_complaints
##   <chr>                    <int>
## 1 BRONX                    11756
## 2 BROOKLYN                 12383
## 3 MANHATTAN                13702
## 4 QUEENS                   11674
## 5 STATEN ISLAND              931
## 6 Unspecified                  2
# Summary of complaints by hour of the day
merged_data <- merged_data %>%
  mutate(hour = hour(hour.x))

hourly_summary <- merged_data %>%
  group_by(hour) %>%
  summarise(total_complaints = n(), .groups = "drop")

print(hourly_summary)
## # A tibble: 24 × 2
##     hour total_complaints
##    <int>            <int>
##  1     0             4050
##  2     1             2634
##  3     2             1698
##  4     3             1259
##  5     4             1195
##  6     5              844
##  7     6              786
##  8     7              789
##  9     8              904
## 10     9              888
## # ℹ 14 more rows

Noise Complaints by Borough

ggplot(merged_data, aes(x = Boro, fill = Boro)) +
  geom_bar() +
  labs(
    title = "Total Number of Noise Complaints by Borough",
    x = "Borough",
    y = "Number of Complaints"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Noise Complaints by Hour of the Day

# Plot the number of noise complaints by hour of the day
ggplot(merged_data, aes(x = hour)) +
  geom_bar(fill = "blue", alpha = 0.6) +
  labs(
    title = "Noise Complaints by Hour of the Day",
    x = "Hour of the Day",
    y = "Number of Complaints"
  ) +
  theme_minimal() +
  scale_x_continuous(breaks = seq(0, 23, by = 1)) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14)
  )

Spatial Distribution of Noise Complaints

mapview(vehicle_noise_data_2023_sf, zcol = "Borough")

Step 4: Assessing the Veracity of the Hypothesis Using Formal Hypothesis Evaluation Approaches

Step 4.1: Observed Intensity Function

Traffic volume data in New York City

traffic_data_aggregated <- traffic_data_aggregated %>%
  mutate(WktGeom = as.character(WktGeom))  
traffic_data_aggregated_sf <- st_as_sf(traffic_data_aggregated, wkt = "WktGeom", crs = 2263)
traffic_data_aggregated_sf <- st_transform(traffic_data_aggregated_sf, 4326)
print(head(st_geometry(traffic_data_aggregated_sf), 5))  
## Geometry set for 5 features 
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -73.773 ymin: 40.68121 xmax: -73.773 ymax: 40.68121
## Geodetic CRS:  WGS 84
## POINT (-73.773 40.68121)
## POINT (-73.773 40.68121)
## POINT (-73.773 40.68121)
## POINT (-73.773 40.68121)
## POINT (-73.773 40.68121)
leaflet() %>%
  addTiles() %>%
  addCircleMarkers(data = traffic_data_aggregated_sf,
                   radius = ~log(total_volume + 1), 
                   color = "green",
                   fillOpacity = 0.5,
                   group = "Traffic Volume",
                   popup = ~paste("Traffic Volume:", total_volume, "<br>Borough:", Borough, "<br>Location:", location_desc)) %>%
  addLayersControl(
    overlayGroups = c("Traffic Volume"),
    options = layersControlOptions(collapsed = FALSE)
  )

Interactive Visualization of the KDE

vehicle_noise_data_2023_sf <- st_transform(vehicle_noise_data_2023_sf, 2263)

coords <- st_coordinates(vehicle_noise_data_2023_sf)
win <- owin(xrange = range(coords[, 1]), yrange = range(coords[, 2]))
noise_ppp <- ppp(x = coords[, 1], y = coords[, 2], window = win)
## Warning: data contain duplicated points
intensity_noise <- density(noise_ppp, sigma = 1000)
kde_df <- as.data.frame(intensity_noise)
kde_raster <- rasterFromXYZ(kde_df[, c("x", "y", "value")])
crs(kde_raster) <- CRS("+init=epsg:2263")
## Warning in CPL_crs_from_input(x): GDAL Message 1: +init=epsg:XXXX syntax is
## deprecated. It might return a CRS with a non-EPSG compliant axis order.
kde_raster_wgs84 <- projectRaster(kde_raster, crs = CRS("+init=epsg:4326"))
traffic_data_aggregated_sf <- st_transform(traffic_data_aggregated_sf, 4326)
leaflet() %>%
  addTiles() %>%
  addRasterImage(kde_raster_wgs84, 
                 colors = colorNumeric(palette = c("blue", "yellow", "red"), domain = values(kde_raster_wgs84), na.color = "transparent"),
                 opacity = 0.5,
                 group = "Noise Complaint Density") %>%
  addLayersControl(
    overlayGroups = c("Noise Complaint Density"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  
  # Add legends for better interpretation
  addLegend(pal = colorNumeric(palette = c("blue", "yellow", "red"), domain = values(kde_raster_wgs84)),
            values = values(kde_raster_wgs84),
            title = "Noise Complaint Density",
            position = "bottomright")

Compute Observed Intensity Function Using Kernel Density Estimation (KDE)

coords <- st_coordinates(vehicle_noise_data_2023_sf)
win <- owin(xrange = range(coords[, 1]), yrange = range(coords[, 2]))
noise_ppp <- ppp(x = coords[, 1], y = coords[, 2], window = win)
## Warning: data contain duplicated points
intensity_noise <- density(noise_ppp, sigma = 500)
plot(intensity_noise, main = "Observed Intensity Function of Noise Complaints (2023)", col = viridis::viridis(30))

Kernel Density Estimation (KDE) of Noise Complaints

road_map <- st_read("data/road_map.geojson")
## Reading layer `road_map' from data source 
##   `/Users/huoxingrui/Desktop/6750/Final Project/data/road_map.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 117191 features and 49 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: -74.25521 ymin: 40.49786 xmax: -73.70002 ymax: 40.9151
## Geodetic CRS:  WGS 84
road_map <- st_transform(road_map, 2263)  
ggplot() +
  geom_sf(data = road_map, fill = "grey90", color = "grey70", size = 0.3) +
  geom_raster(data = kde_df, aes(x = x, y = y, fill = value), alpha = 0.6) +
  scale_fill_viridis_c(name = "Complaint Intensity") +
  theme_minimal() +
  labs(
    title = "Kernel Density Estimation of Noise Complaints in NYC (2023)",
    x = "Longitude",
    y = "Latitude"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14)
  )

Step 4.2: Monte Carlo Simulations of Null Hypothesis

Monte Carlo Simulations of the Null Hypothesis

simulations <- replicate(999, rpoispp(lambda = intensity(noise_ppp), win = win))

Observed Intensity vs Simulated Patterns

ggplot() +
  geom_sf(data = road_map, fill = "grey90", color = "grey70", size = 0.3) +
  geom_raster(data = kde_df, aes(x = x, y = y, fill = value), alpha = 0.6) +
  scale_fill_viridis_c(name = "Complaint Intensity") +
  geom_point(data = as.data.frame(simulations[[1]]), aes(x = x, y = y), color = "black", alpha = 0.3, size = 0.5) +
  theme_minimal() +
  labs(
    title = "Observed Intensity vs Simulated CSR Pattern",
    x = "Longitude",
    y = "Latitude"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14)
  )

Overlay Simulated Patterns on Observed Intensity Function

plot(intensity_noise, main = "Observed Noise Complaint Intensity vs Simulated Patterns")
for (i in 1:5) {
  sim_coords <- as.data.frame(simulations[[i]])
  points(sim_coords$x, sim_coords$y, col = "red", cex = 0.5, pch = ".")
}

Step 4.4: Compute Ripley’s K-Function for Spatial Patterns

Compute Ripley’s K-Function

K_noise <- Kest(noise_ppp)
## number of data points exceeds 3000 - computing border correction estimate only
env <- envelope(noise_ppp, Kest, nsim = 10, rank = 1)
## Generating 10 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 
## 10.
## 
## Done.
plot(env, main = "Ripley's K-Function for Noise Complaints (Observed vs Null)")

The observed K-function (K_obs) lies significantly above the theoretical K-function (K_theo) at all distances. This indicates that there are more noise complaints than expected under complete randomness, which implies clustering.

The observed curve also exceeds the upper envelope (K_hi), especially at larger distances. This means that the clustering is stronger than what would be expected under random conditions, suggesting a statistically significant non-random spatial pattern in the noise complaints.

Since K(r) increases with distance and is greater than the null, we can conclude that noise complaints tend to cluster spatially.

Step 5: Initial Conclusion

Our analysis of Ripley’s (K(r))-Function shows that noise complaints in New York City during 2023 are heavily clustered. The observed (K(r))-Function ((K_{obs})) stays well above the theoretical (K(r))-Function ((K_{theo})) at all distances. This means there are more noise complaints than we’d expect if they were randomly distributed, confirming the presence of clustering.

On top of that, the observed curve also goes above the upper confidence envelope ((K_{hi})), especially at larger distances. This makes it clear that the clustering effect is not just random—it’s statistically significant. Since (K(r)) keeps increasing with distance, it’s obvious that noise complaints aren’t evenly spread out across the city but tend to group together in certain areas.

What This Means

  1. Noise is Localized
    The clustering suggests that noise complaints are likely caused by specific local factors, such as heavy traffic, construction zones, or highly populated neighborhoods.

  2. Patterns Depend on Space
    It also indicates that noise complaints are influenced by what’s nearby. For instance, complaints might be higher near busy roads or big intersections.

  3. Link to Traffic Volume
    From the visualizations we’ve done so far, it looks like there might be a connection between traffic volume and noise complaints. High traffic areas seem to match up with places where there are more complaints. However, we’ll need to dig deeper with statistical tests to confirm this.

What Could Be Improved

  1. Edge Effects
    Since we’re only looking at NYC, areas near the edges of the map might affect the results. This could make us overestimate or underestimate clustering at certain distances.

  2. Time Patterns
    Right now, we’re only looking at yearly data. Breaking it down by time (like rush hours or seasons) might reveal even more about when and where noise complaints happen.

  3. Data Matching
    The traffic and noise data don’t perfectly line up spatially. This might make the results a little less accurate than they could be.

Next Steps

Here’s what we can do to build on these findings:

  • Run spatial regression models to measure how much traffic volume affects noise complaints.
  • Use cross (K(r))-Functions to directly check the relationship between traffic and noise.
  • Analyze the data by time to see if patterns change throughout the day or year.
  • Make sure the spatial alignment of traffic and noise data is as accurate as possible.

Wrapping It Up

In short, noise complaints in NYC aren’t random—they’re clearly clustered, and local factors like traffic volume might play a big role. While our early results suggest a link between traffic and noise, we need more analysis to confirm and quantify it. These findings could help city planners and policymakers better address noise issues and improve life for New Yorkers.

Step 6: Road Map

To further explore and refine the findings from Step 5, here are the key next steps:

1. Strengthen the Spatial Relationship Analysis

  • Run Spatial Regression Models
    Conduct spatial regression analyses, such as spatial lag or error models, to quantify the relationship between traffic volume and noise complaints. This will help identify how much traffic contributes to the clustering of complaints.
  • Apply Cross (K(r))-Functions
    Use cross (K(r))-Functions to directly measure the spatial relationship between traffic volume and noise complaints. This will add depth to the clustering analysis by linking the two datasets.

2. Incorporate Temporal Analysis

  • Break Down by Time of Day or Week
    Analyze patterns during rush hours, weekends, or different seasons to see how the clustering of noise complaints varies over time.
  • Visualize Temporal Trends
    Create time-based heatmaps or animations to illustrate how noise complaint density shifts throughout the day or year.

3. Refine Data Integration

  • Improve Spatial Alignment
    Ensure that traffic and noise data are aligned at the finest possible spatial resolution, such as census tracts or grid cells. This will reduce uncertainties and improve the reliability of the results.
  • Explore Additional Variables
    Incorporate other environmental or socioeconomic data, like population density or road types, to better understand the underlying causes of noise complaints.

4. Conduct Sensitivity Analysis

  • Test how varying parameters in the kernel density estimation (KDE) or Monte Carlo simulations affect the results. This will help confirm the robustness of the conclusions drawn so far.

5. Extend Visualizations

  • Combine Maps
    Create overlay maps showing both traffic volume and noise complaint intensity for direct visual comparison.
  • Interactive Dashboards
    Build interactive maps or dashboards to allow users to explore the relationship between traffic and noise complaints dynamically.

6. Policy Recommendations

  • Based on refined findings, suggest actionable steps for city planners, such as reducing traffic in noise hotspots or implementing better noise barriers near high-traffic zones.

7. Report and Dissemination

  • Finalize the Quarto document by integrating all findings, visualizations, and analyses into a cohesive report. Ensure that the conclusions are backed by evidence and clearly communicate the implications.

Long-Term Considerations

  • Investigate whether similar patterns are present in other cities or regions.
  • Expand the dataset to include other types of noise complaints, such as those related to construction or public events, to generalize findings beyond traffic-related noise.

By following these steps, we can deepen the understanding of the relationship between traffic volume and noise complaints in NYC and refine the insights for practical urban planning applications.